A Comparative Study of the Parallel Performance of the Blocking and Non-Blocking MPI Communication Commands on an Elliptic Test Problem on the Cluster tara

Authors

Hafez Tari

Matthias K. Gobbert

Abstract

In this report we study the parallel solution of the elliptic test problem of a Poisson equation with homogeneous Dirichlet boundary conditions on a two-dimensional domain. We use the finite difference method to approximate the governing equation by a system of N^2 linear equations, where N is the number of interior grid points in each spatial direction. To parallelize the computation, we distribute blocks of rows of the interior mesh points among the parallel processes. We then solve the linear system with the iterative conjugate gradient method in a so-called matrix-free implementation, in which each process works only on its local portion of the vectors. The conjugate gradient method starts from the zero vector as initial guess and iterates until the Euclidean norm of the global residual, relative to the norm of the initial residual, falls below a predefined tolerance. Because every matrix-vector product in the conjugate gradient method requires the values on the grid interfaces held by neighboring processes, two modes of MPI communication, namely blocking and non-blocking send and receive, are employed for this data exchange. The results show excellent performance on the cluster tara with up to 512 parallel processes on 64 compute nodes, especially when the non-blocking MPI commands are used.

The cluster tara is an IBM System x iDataPlex purchased in 2009 by the UMBC High Performance Computing Facility (www.umbc.edu/hpcf). It is an 86-node distributed-memory cluster consisting of 82 compute nodes, 2 development nodes, 1 user node, and 1 management node. Each node features two quad-core Intel Nehalem X5550 processors (2.66 GHz, 8 MB cache), 24 GB of memory, and a 120 GB local hard drive. All nodes and the 160 TB central storage are connected by an InfiniBand (QDR) interconnect network.
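The algorithm summarized in the abstract can be illustrated with a small serial sketch. The Python code below is not the authors' implementation. It assumes the unit square (0,1)^2 as domain, the unscaled 5-point stencil with right-hand side b = h^2 f where h = 1/(N+1), and a relative tolerance of 1e-6, and all function names (split_rows, exchange_ghosts, apply_A, dot, axpy, cg, poisson_rhs, gather) are hypothetical. Parallel processes are simulated by a list of row blocks: the matrix-vector product only needs one ghost row from each neighboring block, and each dot product is a sum of local partial sums. Comments mark where an MPI code would exchange ghost rows with blocking MPI_Send/MPI_Recv or non-blocking MPI_Isend/MPI_Irecv plus MPI_Waitall, and where it would call MPI_Allreduce.

```python
import math

# Serial sketch (an assumption, not the authors' code): the N x N interior grid
# is split into blocks of rows, one per simulated process. Comments mark where
# a real MPI code would communicate.


def split_rows(n, nprocs):
    """Assign n grid rows to nprocs processes as contiguous (start, stop) ranges."""
    assert 1 <= nprocs <= n
    base, extra = divmod(n, nprocs)
    ranges, start = [], 0
    for rank in range(nprocs):
        stop = start + base + (1 if rank < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def exchange_ghosts(blocks, n):
    """Pad each local block with one ghost row above and one below.

    MPI: neighboring ranks swap interface rows here, via blocking
    MPI_Send/MPI_Recv or non-blocking MPI_Isend/MPI_Irecv + MPI_Waitall.
    Ghost rows outside the domain are zero (homogeneous Dirichlet condition).
    """
    zero = [0.0] * n
    last = len(blocks) - 1
    return [[blocks[rank - 1][-1] if rank > 0 else zero]
            + block
            + [blocks[rank + 1][0] if rank < last else zero]
            for rank, block in enumerate(blocks)]


def apply_A(blocks, n):
    """Matrix-free 5-point stencil: (Au)_ij = 4 u_ij minus its four neighbors."""
    result = []
    for padded in exchange_ghosts(blocks, n):
        result.append([[4.0 * row[i]
                        - (row[i - 1] if i > 0 else 0.0)
                        - (row[i + 1] if i < n - 1 else 0.0)
                        - prev_row[i] - next_row[i]
                        for i in range(n)]
                       for prev_row, row, next_row
                       in zip(padded, padded[1:], padded[2:])])
    return result


def dot(xs, ys):
    """Global dot product: a local partial sum per process, then a global sum."""
    partial = [sum(a * b for xr, yr in zip(xb, yb) for a, b in zip(xr, yr))
               for xb, yb in zip(xs, ys)]
    return sum(partial)  # MPI: MPI_Allreduce with MPI_SUM


def axpy(alpha, xs, ys):
    """Return alpha * x + y block by block (purely local, no communication)."""
    return [[[alpha * a + b for a, b in zip(xr, yr)] for xr, yr in zip(xb, yb)]
            for xb, yb in zip(xs, ys)]


def cg(b, n, tol=1e-6, max_iter=10000):
    """CG from the zero initial guess; stops once ||r_k|| <= tol * ||r_0||."""
    x = [[[0.0] * n for _ in block] for block in b]
    r = d = b  # r_0 = b - A*0 = b; lists are never modified in place
    rr = dot(r, r)
    norm_r0 = math.sqrt(rr)
    if norm_r0 == 0.0:
        return x, 0
    for k in range(1, max_iter + 1):
        q = apply_A(d, n)  # the only step that needs ghost rows
        alpha = rr / dot(d, q)
        x = axpy(alpha, d, x)
        r = axpy(-alpha, q, r)
        rr_new = dot(r, r)
        if math.sqrt(rr_new) <= tol * norm_r0:
            return x, k
        d = axpy(rr_new / rr, d, r)  # d = r + beta * d
        rr = rr_new
    return x, max_iter


def poisson_rhs(n, nprocs, f):
    """b_ij = h^2 f(x_i, y_j) on the interior grid, split into row blocks."""
    h = 1.0 / (n + 1)
    return [[[h * h * f((i + 1) * h, (j + 1) * h) for i in range(n)]
             for j in range(start, stop)]
            for start, stop in split_rows(n, nprocs)]


def gather(blocks):
    """Concatenate the row blocks into one N x N list (cf. MPI_Gather)."""
    return [row for block in blocks for row in block]
```

For example, with f(x, y) = 2π^2 sin(πx) sin(πy), whose exact solution is u(x, y) = sin(πx) sin(πy), calling cg(poisson_rhs(31, 4, f), 31) and comparing gather(u) with u at the grid points shows a discretization error of order h^2. Splitting the rows among different numbers of simulated processes gives identical matrix-vector products, because the ghost rows supply exactly the neighbor values a single process would read directly; only the order of the dot-product sums changes.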

Similar resources

A Message-Passing Distributed Memory Parallel Algorithm for a Dual-Code Thin Layer, Parabolized Navier-Stokes Solver

In this study, the results of parallelization of a 3-D dual code (Thin Layer, Parabolized Navier-Stokes solver) for solving supersonic turbulent flow around body and wing-body combinations are presented. As a serial code, the TLNS solver is very time-consuming and requires a large amount of memory due to the iterative and lengthy computations. Also, for complicated geometries, an excessive number of grid...

A Comparative Study of the MPI Communication Primitives on a Cluster

MPI (Message Passing Interface) has become the de facto standard for implementing parallel programs on distributed systems. In MPI, the two basic communication primitives are point-to-point communication and broadcast. In this paper, we evaluate and compare the performance of broadcast with point-to-point communication (both blocking and non-blocking) of the MPI-1 standard library o...

A Two-Threshold Guard Channel Scheme for Minimizing Blocking Probability in Communication Networks

In this paper, we consider the call admission problem in cellular networks with two classes of voice users. In the first part of the paper, we introduce a two-threshold guard channel policy and study its limiting behavior under stationary traffic. Then we give an algorithm for finding the optimal number of guard channels. In the second part of this paper, we give an algorithm that minimizes th...

An Effective Hybrid Genetic Algorithm for Hybrid Flow Shops with Sequence Dependent Setup Times and Processor Blocking

Hybrid flow-shop or flexible flow-shop problems have remained the subject of intensive research over several years. Hybrid flow-shop problems overcome one of the limitations of the classical flow-shop model by allowing parallel processors at each stage of task processing. In many papers, the assumptions are generally made that there is unlimited storage available between stages and the setup times a...

A population-based algorithm for the railroad blocking problem

The railroad blocking problem (RBP) is one of the problems that require an important decision in freight railroads. The objective of solving this problem is to minimize the costs of delivering all commodities by deciding which inter-terminal blocks to build and by specifying the assignment of commodities to these blocks, while observing limits on the number and cumulative volume of the blocks assemble...

Publication date: 2010
